
YoBulk AI

Prerequisite before exploring YoBulk AI use cases:

YoBulk is bullish on solving first-mile data onboarding problems through AI. This document highlights some real CSV-importing use cases where AI and ML models can make a significant impact.


AI & ML CSV importing use cases

  • ML models learn over time how data from a CSV should be organised. Users can label data and train the models from example CSVs and expected-output CSVs.

  • The AI approach will replace rule-based validations. ML models can perform both syntactic checks (e.g. validating that telephone numbers follow a specific format) and semantic checks (e.g. checking that numbers fall within an expected range).

  • To reduce the risk of creating a "Garbage In, Garbage Out" model, automated data cleansing plays an instrumental role in data onboarding and in further analysis of flat-file data.
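
To make the difference between syntactic and semantic checks concrete, here is a minimal Python sketch of the two kinds of check written as ordinary hand-coded rules, which is what an ML model would aim to learn from labeled examples. The phone format, the age range, and the function names are assumptions chosen for illustration; they are not part of YoBulk.

```python
import re

# Assumed format for illustration: numbers shaped like 415-555-0132.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_valid_phone(value: str) -> bool:
    """Syntactic check: does the string have the expected shape?"""
    return PHONE_PATTERN.fullmatch(value.strip()) is not None


def is_valid_age(value: str, low: int = 0, high: int = 120) -> bool:
    """Semantic check: is the parsed number within a plausible range?"""
    try:
        age = int(value)
    except ValueError:
        # Not an integer at all, so it cannot be a valid age.
        return False
    return low <= age <= high
```

A value can pass the syntactic check and still fail the semantic one: "130" is a well-formed integer but falls outside the assumed age range.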

Auto-Recommendation

An ML model remembers the cleaning steps that a specific user has applied and provides automatic recommendations during future imports.

Auto-Column matching:

An ML model recognises the data type from the column name during template creation.

Column matching is an important feature in which YoBulk tries to match the CSV columns with the template columns. An ML model can help perform accurate fuzzy matching and can even suggest a match by classifying the column text and deriving the data type from context.
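
As a rough illustration of fuzzy column matching, the sketch below uses plain string similarity from Python's standard `difflib` module. It is a simple baseline, not an ML model and not YoBulk's implementation. The function names, the normalisation step, and the 0.6 cutoff are assumptions.

```python
import difflib


def normalize(name: str) -> str:
    """Lower-case a column name and drop spaces, dashes, and underscores."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_columns(csv_columns, template_columns, cutoff=0.6):
    """Map each CSV column to its most similar template column, or None."""
    normalized_template = {normalize(t): t for t in template_columns}
    mapping = {}
    for column in csv_columns:
        # get_close_matches returns the best candidates above the cutoff.
        candidates = difflib.get_close_matches(
            normalize(column), list(normalized_template), n=1, cutoff=cutoff
        )
        mapping[column] = normalized_template[candidates[0]] if candidates else None
    return mapping


suggestions = match_columns(
    ["E-mail", "First Name", "Phone No", "xyz"],
    ["email", "first_name", "zip_code", "phone"],
)
```

Columns with no sufficiently similar template column map to `None`, which leaves room for a manual choice or a smarter model-based suggestion.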

YoBulk AI auto column matching:


YoBulk AI Schema generation:

Schemas with regexes are a core part of the data hooks used heavily in YoBulk. Regular expressions have their own syntax and rules, which can be difficult to understand at first. They can also be very dense and compact, making it hard to see the overall pattern. For example, you could use a regular expression to ensure that a given email address is in the correct format and has the correct domain name before accepting it as valid input. Such an expression combines letters, numbers, and special characters to define a pattern that is matched against email addresses.
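
The email case can be sketched as follows. The pattern is a deliberately simplified assumption rather than the expression YoBulk generates, and it does not cover every address permitted by the email standards. The `allowed_domain` parameter is hypothetical.

```python
import re

# Simplified pattern: local part, "@", then a domain with at least one dot.
# The domain is captured so that it can be compared separately.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def is_valid_email(address, allowed_domain=None):
    """Check the address format and, optionally, its domain name."""
    match = EMAIL_PATTERN.fullmatch(address)
    if match is None:
        return False
    if allowed_domain is None:
        return True
    # Domain names are case-insensitive, so compare in lower case.
    return match.group(1).lower() == allowed_domain.lower()
```

Even this short pattern shows how dense regular expressions become, which is the motivation for generating them from a plain-language description.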

YoBulk provides an NLP-based interface that generates YoBulk template schemas from plain English.


ML modeling with cell linkage and context

Identifying and removing outliers at the cell level is the most important part of validation. Context and linkage have to be attached to each cell, because the model has to understand more than rule-based validation can express. For example, the same address cannot have two different zip codes across two records in the same CSV file. A standard validation rule at the column or field level cannot catch this problem.
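
The address and zip code example can be illustrated with a small cross-record check. This is a deterministic sketch of the kind of linkage a model would need to capture, not an ML model. The field names `address` and `zip` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def find_conflicting_zip_codes(rows, address_field="address", zip_field="zip"):
    """Return addresses that appear with more than one zip code."""
    zips_by_address = defaultdict(set)
    for row in rows:
        # Normalise case and whitespace so trivially different spellings link up.
        key = " ".join(row[address_field].lower().split())
        zips_by_address[key].add(row[zip_field].strip())
    return {
        address: sorted(zips)
        for address, zips in zips_by_address.items()
        if len(zips) > 1
    }


sample = """name,address,zip
Ann,12 Main St,94105
Bob,12 main st,94107
Cy,9 Oak Ave,10001
"""
conflicts = find_conflicting_zip_codes(csv.DictReader(io.StringIO(sample)))
```

Each row passes a per-column zip format rule on its own; the conflict only becomes visible when records are compared with each other.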